{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_3_keras_hyperparameters.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 8: Kaggle Data Sets**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 8 Material\n",
    "\n",
    "* Part 8.1: Introduction to Kaggle [[Video]](https://www.youtube.com/watch?v=v4lJBhdCuCU&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_1_kaggle_intro.ipynb)\n",
    "* Part 8.2: Building Ensembles with Scikit-Learn and Keras [[Video]](https://www.youtube.com/watch?v=LQ-9ZRBLasw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_2_keras_ensembles.ipynb)\n",
    "* **Part 8.3: How Should you Architect Your Keras Neural Network: Hyperparameters** [[Video]](https://www.youtube.com/watch?v=1q9klwSoUQw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_3_keras_hyperparameters.ipynb)\n",
    "* Part 8.4: Bayesian Hyperparameter Optimization for Keras [[Video]](https://www.youtube.com/watch?v=sXdxyUCCm8s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_4_bayesian_hyperparameter_opt.ipynb)\n",
    "* Part 8.5: Current Semester's Kaggle [[Video]](https://www.youtube.com/watch?v=PHQt0aUasRg&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_5_kaggle_project.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: not using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "# Startup CoLab\n",
    "try:\n",
    "    %tensorflow_version 2.x\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "\n",
    "# Nicely formatted time string\n",
    "def hms_string(sec_elapsed):\n",
    "    h = int(sec_elapsed / (60 * 60))\n",
    "    m = int((sec_elapsed % (60 * 60)) / 60)\n",
    "    s = sec_elapsed % 60\n",
    "    return \"{}:{:>02}:{:>05.2f}\".format(h, m, s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 8.3: Architecting Network: Hyperparameters\n",
    "\n",
    "You have probably noticed several hyperparameters introduced previously in this course that you need to choose for your neural network. The number of layers, neuron counts per layer, layer types, and activation functions are all choices you must make to optimize your neural network. Some of the categories of hyperparameters for you to choose from coming from the following list:\n",
    "\n",
    "* Number of Hidden Layers and Neuron Counts\n",
    "* Activation Functions\n",
    "* Advanced Activation Functions\n",
    "* Regularization: L1, L2, Dropout\n",
    "* Batch Normalization\n",
    "* Training Parameters\n",
    "\n",
    "The following sections will introduce each of these categories for Keras. While I will provide some general guidelines for hyperparameter selection, no two tasks are the same. You will benefit from experimentation with these values to determine what works best for your neural network. In the next part, we will see how machine learning can select some of these values independently.\n",
    "\n",
    "## Number of Hidden Layers and Neuron Counts\n",
    "\n",
    "The structure of Keras layers is perhaps the hyperparameters that most become aware of first. How many layers should you have? How many neurons are on each layer? What activation function and layer type should you use? These are all questions that come up when designing a neural network. There are many different [types of layer](https://keras.io/layers/core/) in Keras, listed here:\n",
    "\n",
    "* **Activation** - You can also add activation functions as layers.  Using the activation layer is the same as specifying the activation function as part of a Dense (or other) layer type.\n",
    "* **ActivityRegularization** Used to add L1/L2 regularization outside of a layer. You can specify L1 and L2 as part of a Dense (or other) layer type.\n",
    "* **Dense** - The original neural network layer type. In this layer type, every neuron connects to the next layer. The input vector is one-dimensional, and placing specific inputs next does not affect each other. \n",
    "* **Dropout** - Dropout consists of randomly setting a fraction rate of input units to 0 at each update during training time, which helps prevent overfitting. Dropout only occurs during training.\n",
    "* **Flatten** - Flattens the input to 1D and does not affect the batch size.\n",
    "* **Input** - A Keras tensor is a tensor object from the underlying back end (Theano, TensorFlow, or CNTK), which we augment with specific attributes to build a Keras by knowing the inputs and outputs of the model.\n",
    "* **Lambda** - Wraps arbitrary expression as a Layer object.\n",
    "* **Masking** - Masks a sequence using a mask value to skip timesteps.\n",
    "* **Permute** - Permutes the input dimensions according to a given pattern. Useful for tasks such as connecting RNNs and convolutional networks.\n",
    "* **RepeatVector** - Repeats the input n times.\n",
    "* **Reshape** - Similar to Numpy reshapes.\n",
    "* **SpatialDropout1D** - This version performs the same function as Dropout; however, it drops entire 1D feature maps instead of individual elements. \n",
    "* **SpatialDropout2D** - This version performs the same function as Dropout; however, it drops entire 2D feature maps instead of individual elements\n",
    "* **SpatialDropout3D** - This version performs the same function as Dropout; however, it drops entire 3D feature maps instead of individual elements. \n",
    "\n",
    "There is always trial and error for choosing a good number of neurons and hidden layers. Generally, the number of neurons on each layer will be larger closer to the hidden layer and smaller towards the output layer. This configuration gives the neural network a somewhat triangular or trapezoid appearance.\n",
    "\n",
    "## Activation Functions\n",
    "\n",
    "Activation functions are a choice that you must make for each layer. Generally, you can follow this guideline:\n",
    "* Hidden Layers - RELU\n",
    "* Output Layer - Softmax for classification, linear for regression.\n",
    "\n",
    "Some of the common activation functions in Keras are listed here:\n",
    "\n",
    "* **softmax** - Used for multi-class classification.  Ensures all output neurons behave as probabilities and sum to 1.0.\n",
    "* **elu** - Exponential linear unit.  Exponential Linear Unit or its widely known name ELU is a function that tends to converge cost to zero faster and produce more accurate results. Can produce negative outputs.\n",
    "* **selu** - Scaled Exponential Linear Unit (SELU), essentially **elu** multiplied by a scaling constant.\n",
    "* **softplus** - Softplus activation function. $log(exp(x) + 1)$  [Introduced](https://papers.nips.cc/paper/1920-incorporating-second-order-functional-knowledge-for-better-option-pricing.pdf) in 2001.\n",
    "* **softsign** Softsign activation function. $x / (abs(x) + 1)$ Similar to tanh, but not widely used.\n",
    "* **relu** - Very popular neural network activation function.  Used for hidden layers, cannot output negative values. No trainable parameters.\n",
    "* **tanh** Classic neural network activation function, though often replaced by relu family on modern networks.\n",
    "* **sigmoid** - Classic neural network activation.  Often used on output layer of a binary classifier.\n",
    "* **hard_sigmoid** - Less computationally expensive variant of sigmoid.\n",
    "* **exponential** - Exponential (base e) activation function.\n",
    "* **linear** - Pass-through activation function. Usually used on the output layer of a regression neural network.\n",
    "\n",
    "For more information about Keras activation functions refer to the following:\n",
    "\n",
    "* [Keras Activation Functions](https://keras.io/activations/)\n",
    "* [Activation Function Cheat Sheets](https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html)\n",
    "\n",
    "\n",
    "### Advanced Activation Functions\n",
    "\n",
    "Hyperparameters are not changed when the neural network trains. You, the network designer, must define the hyperparameters. The neural network learns regular parameters during neural network training. Neural network weights are the most common type of regular parameter. The \"[advanced activation functions](https://keras.io/layers/advanced-activations/),\" as Keras call them, also contain parameters that the network will learn during training. These activation functions may give you better performance than RELU.\n",
    "\n",
    "* **LeakyReLU** - Leaky version of a Rectified Linear Unit. It allows a small gradient when the unit is not active, controlled by alpha hyperparameter.\n",
    "* **PReLU** - Parametric Rectified Linear Unit, learns the alpha hyperparameter. \n",
    "\n",
    "## Regularization: L1, L2, Dropout\n",
    "\n",
    "\n",
    "\n",
    "* [Keras Regularization](https://keras.io/regularizers/)\n",
    "* [Keras Dropout](https://keras.io/layers/core/)\n",
    "\n",
    "## Batch Normalization\n",
    "\n",
    "* [Keras Batch Normalization](https://keras.io/layers/normalization/)\n",
    "\n",
    "* Ioffe, S., & Szegedy, C. (2015). [Batch normalization: Accelerating deep network training by reducing internal covariate shift](https://arxiv.org/abs/1502.03167). *arXiv preprint arXiv:1502.03167*.\n",
    "\n",
    "Normalize the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1. Can allow learning rate to be larger.\n",
    "\n",
    "\n",
    "## Training Parameters\n",
    "\n",
    "* [Keras Optimizers](https://keras.io/optimizers/)\n",
    "\n",
    "* **Batch Size** - Usually small, such as 32 or so.\n",
    "* **Learning Rate**  - Usually small, 1e-3 or so.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
